Exploratory compositional data analysis using the R-package robCompositions

Authors

  • K. Hron
  • M. Templ
  • P. Filzmoser
Abstract

Compositional data are multivariate observations that carry only relative information: the ratios between the variables, not their absolute values, are of interest. This matters for the exploratory analysis of such data as well. We present two basic methods for exploratory compositional data analysis (ECDA), namely multivariate outlier detection and the compositional biplot. The methods are illustrated on a small data example using the R package robCompositions.

1 Compositional data

In practice, data frequently consist of percentages or, more generally, of variables for which the ratios rather than the absolute values are of interest. Observations of this kind are usually characterized by a positive constant-sum constraint on the variables (typically 1 for proportions or 100 for percentages); this condition is, however, not necessary. Multivariate observations that quantitatively describe the parts of some whole and convey exclusively relative information are now known as compositional data, or compositions for short. The D-part composition x = (x1, . . . , xD)′ and any positive real multiple cx, c > 0, convey essentially the same information. The sample space of compositions is the D-part simplex, a (D − 1)-dimensional subset of R^D that contains all D-part compositions summing to a prescribed constant. The nature of compositions calls for a special geometry, now called the Aitchison geometry, with the special operations of perturbation and power transformation and the Aitchison inner product with the usual Hilbert space properties [5]. The geometry is named after John Aitchison, a British statistician who proposed the first comprehensive theory for the statistical analysis of compositional data [1]. Compositions need special treatment because their sample space differs from the Euclidean one: the usual statistical methods are designed for the Euclidean sample space, where the information is absolute rather than relative, and thus cannot be applied directly to compositions. As a way out, J. Aitchison proposed the family of log-ratio transformations from the
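
The scale invariance and the Aitchison operations described in this section can be illustrated with a minimal Python sketch. It is not taken from the paper or from robCompositions: the helper names (closure, perturb, power, aitchison_inner) and the example values are hypothetical, and the formulas are the standard textbook definitions rather than anything stated in this excerpt.

```python
import math

def closure(x, total=1.0):
    """Rescale the positive parts of x so that they sum to `total`."""
    s = sum(x)
    return [total * xi / s for xi in x]

def perturb(x, y):
    """Perturbation: close the part-wise product of two compositions."""
    return closure([xi * yi for xi, yi in zip(x, y)])

def power(x, a):
    """Power transformation: close the parts of x raised to the real power a."""
    return closure([xi ** a for xi in x])

def aitchison_inner(x, y):
    """Aitchison inner product written with pairwise log-ratios."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(d):
            total += math.log(x[i] / x[j]) * math.log(y[i] / y[j])
    return total / (2 * d)

x = [10.0, 30.0, 60.0]        # hypothetical 3-part composition, in percent
y = [0.2, 0.5, 0.3]           # another hypothetical composition, in proportions
cx = [2.5 * xi for xi in x]   # a positive multiple of x

# x and cx share all ratios, so they close to the same point of the simplex.
same_point = all(math.isclose(a, b) for a, b in zip(closure(x), closure(cx)))

# The inner product depends only on ratios, so rescaling x leaves it unchanged.
same_inner = math.isclose(aitchison_inner(x, y), aitchison_inner(cx, y))

print(same_point, same_inner)
```

Because every function works only with ratios of parts, it does not matter whether the input is recorded in percentages, proportions, or any other positive scale.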

Similar resources

Robust Imputation of Missing Values in Compositional Data Using the R-Package robCompositions

The aim of this contribution is to show how the R-package robCompositions can be applied to estimate missing values in compositional data. Two procedures are summarized, one of them being highly stable also in presence of outlying observations. Measures for information loss are presented, and it is demonstrated how they can be applied. Moreover, we introduce new diagnostic tools that are useful...

The Use of Robust Factor Analysis of Compositional Geochemical Data for the Recognition of the Target Area in Khusf 1:100000 Sheet, South Khorasan, Iran

The closed nature of geochemical data has been proven in many studies. Compositional data have special properties that mean that standard statistical methods cannot be used to analyse them. These data imply a particular geometry called Aitchison geometry in the simplex space. For analysis, the dataset must first be opened by the various transformations provided. One of the most popular of the a...

Enabling Event-data Analysis in R - Demonstration

This demonstration introduces a newly developed R package named edeaR (Exploratory and Descriptive Event-based Data Analysis in R). The package aims to handle, describe and select event data using a set of predefined methods. Consequently, it enables the vast collection of data manipulation and analysis functionalities within R to be used for event data.

GeoXp: An R Package for Exploratory Spatial Data Analysis

We present GeoXp, an R package implementing interactive graphics for exploratory spatial data analysis. We use data bases coming from the spdep package to illustrate the use of these exploratory techniques based on the coupling between a statistical graph and a map. Besides elementary plots like boxplots, histograms or simple scatterplots, GeoXp also couples maps with Moran scatterplots, variog...

Application of compositional data analysis to geochemical data of marine sediments

In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation...

Publication date: 2010